Stochastic blockmodels and community structure in networks 
Stochastic blockmodels have been proposed as a tool for detecting community structure in networks
as well as for generating synthetic networks for use as benchmarks. Most blockmodels,
however, ignore variation in vertex degree, making them unsuitable for applications to real-world 
networks, which typically display broad degree distributions that can significantly distort the results. 
Here we demonstrate how the generalization of blockmodels to incorporate this missing element leads 
to an improved objective function for community detection in complex networks. We also propose a 
heuristic algorithm for community detection using this objective function or its non-degree-corrected 
counterpart and show that the degree-corrected version dramatically outperforms the uncorrected 
one in both real-world and synthetic networks. 



I. INTRODUCTION 

A stochastic blockmodel is a generative model for 
blocks, groups, or communities in networks. Stochastic 
blockmodels fall in the general class of random graph 
models and have a long tradition of study in the social
sciences and computer science. In the simplest
stochastic blockmodel (many more complicated variants 
are possible), each of n vertices is assigned to one of K 
blocks, groups, or communities, and undirected edges are 
placed independently between vertex pairs with probabil- 
ities that are a function only of the group memberships 
of the vertices. If we denote by $g_i$ the group to which
vertex i belongs, then we can define a K × K matrix $\psi$
of probabilities such that the matrix element $\psi_{g_i g_j}$ is the
probability of an edge between vertices i and j.
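
The simple model just described can be sampled in a few lines of Python. This is only a sketch under stated assumptions: the function name `sample_simple_blockmodel`, the edge-list representation, and the example matrix are illustrative choices rather than part of the model's definition.

```python
import random

def sample_simple_blockmodel(g, psi, seed=None):
    """Sample an undirected simple graph from the basic blockmodel.

    g[i] is the group of vertex i and psi[r][s] is the probability of an
    edge between a vertex in group r and a vertex in group s.
    """
    rng = random.Random(seed)
    n = len(g)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Each vertex pair receives an independent Bernoulli trial.
            if rng.random() < psi[g[i]][g[j]]:
                edges.append((i, j))
    return edges

# Hypothetical example: two groups of three vertices, diagonal matrix.
g = [0, 0, 0, 1, 1, 1]
psi = [[1.0, 0.0],
       [0.0, 1.0]]
edges = sample_simple_blockmodel(g, psi, seed=1)
```

With the identity matrix used here every within-group pair is joined and no between-group pair is, so the sample consists of two disconnected triangles.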

While simple to describe, this model can produce a 
wide variety of different network structures. For exam- 
ple, a diagonal probability matrix would produce net- 
works with disconnected components, while the addi- 
tion of small off-diagonal elements would generate con- 
ventional "community structure" — a set of communities 
with dense internal connections and sparse external ones. 
Other choices of probability matrix can generate core- 
periphery, hierarchical, or multipartite structures, among 
others. This versatility, combined with analytic tractabil- 
ity, has made the blockmodel a popular tool in a number 
of contexts. For instance, the planted partition model @, 
which is equivalent to the model above with a specific 
parametrization of the matrix $\psi$, is widely used as a theoretical
testbed for graph partitioning and community
detection algorithms [3, 8].

Another important application, and the one that is 
the primary focus of this paper, is the fitting of block- 
models to empirical network data as a way of discov- 
ering block structure, an approach referred to in the so- 
cial networks literature as a posteriori blockmodeling [J . 
A number of ways of performing the fitting have been 
suggested, including some that make use of techniques 
from physics [9, 10]. A posteriori blockmodeling can be
thought of as a method for community structure detection
in networks, though blockmodeling is considerably
more general than traditional community detection
methods, since it can detect many forms of structure in 
addition to simple communities of dense links. Moreover, 
it has the desirable property (not shared by most other 
approaches) of asymptotic consistency under certain conditions
[11], meaning that if applied to networks that
were themselves generated from the same blockmodel, 
the method can correctly recover the block structure. 

Unfortunately, however, the simple blockmodel de- 
scribed above does not work well in many applications 
to real-world networks. The model is not flexible enough 
to generate networks with structure even moderately sim- 
ilar to that found in most empirical network data, mean- 
ing that a posteriori fits to such data often give poor 
results [12]. Just as the fitting of a straight line to intrinsically
curved data is likely to miss important features of
the data, so a fit of the simple stochastic blockmodel to 
the structure of a complex network is likely to miss much 
and, as we will show, can in some cases give radically 
incorrect answers. 

Attempts to overcome these problems by extending the 
blockmodel have focused particularly on the use of (more 
complicated) p* or exponential random graph models, 
but while these are conceptually appealing, they quickly 
lose the analytic tractability of the original blockmodel as 
their complexity increases. Other recent attempts to extend
blockmodels take the flavor of mixture models that
allow vertices to participate in overlapping groups [13] or
to have mixed membership [14, 15].

In this paper we adopt a different approach, consider- 
ing a simple and apparently minor extension of the classic 
stochastic blockmodel to include heterogeneity in the de- 
grees of vertices. Despite its innocuous appearance, this 
extension turns out to have substantial effects, as we will 
see. A number of previous authors have considered similar
extensions of blockmodels. As early as 1987, Wang
and Wong [16] proposed a stochastic blockmodel for directed
simple graphs incorporating arbitrary expected in-
and out-degrees, along with a selection of other features.
Unfortunately, this model is not solvable for its parameter
values in closed form, which limits its usefulness for
the types of calculations we consider. Several more recent
works have also explored blockmodels with various forms
of degree heterogeneity [17–21], motivated largely by the
recent focus on degree distributions in the networks literature.
We note particularly the currently unpublished
work of Patterson and Bader [21], who apply a variational
Bayes approach to a model close, though not identical,
to the one considered here.

In this paper we build upon the ideas of these au- 
thors but take a somewhat different tack, focusing on 
the question of why degree heterogeneity in blockmodels 
is a good idea. To study this question, we develop a 
degree-corrected blockmodel with closed-form parameter 
solutions, which allows us more directly to compare tra- 
ditional and degree-corrected models. As we show, the 
incorporation of degree heterogeneity in the stochastic 
blockmodel results in a model that in practice performs 
much better, giving significantly improved fits to network 
data, while being only slightly more complex than the 
simple model described above. Although we here exam- 
ine only the simplest version of this idea, the approaches 
we explore could in principle be incorporated into other 
blockmodels, such as the overlapping or mixed member- 
ship models. 

In outline, the paper is as follows. We first review 
the ideas behind the ordinary stochastic blockmodel to 
understand why degree heterogeneity causes problems. 
Then we introduce a degree-corrected version of the 
model and demonstrate its use in a posteriori blockmod- 
eling to infer group memberships in empirical network 
data, showing that the degree-corrected model outper- 
forms the original model both on actual networks and on 
new synthetic benchmarks. The benchmarks introduced, 
which generalize previous benchmarks for community de- 
tection, may also be of independent interest. 



II. STANDARD STOCHASTIC BLOCKMODEL

In this section we review briefly the use of the original,
non-degree-corrected blockmodel, focusing on undirected
networks since they are the most commonly studied.

For consistency with the degree-corrected case we will
allow our networks to contain both multi-edges and self-edges,
even though many real-world networks have no
such edges. As in most random graph models for sparse
networks, the incorporation of multi-edges and self-edges
makes computations easier without affecting the fundamental
outcome significantly: typically their inclusion
gives rise to corrections to the results that are of order
1/n and hence vanishing as the size n of the network
becomes large. For networks with multi-edges, the
previously defined probability $\psi_{rs}$ of an edge between
vertices in groups r and s is replaced by the expected
number of such edges, and the actual number of edges
between any pair of vertices will be drawn from a Poisson
distribution with this mean. In the limit of a large
sparse graph, where the probability of an edge and the
expected number of edges become equal, there is essentially
no difference between the model described here and
the standard blockmodel.

With this in mind, the model we study is now defined
as follows. Let G be an undirected multigraph on n vertices,
possibly including self-edges, and let $A_{ij}$ be an element
of the adjacency matrix of the multigraph. Recall
that the adjacency matrix for a multigraph is conventionally
defined such that $A_{ij}$ is equal to the number of edges
between vertices i and j when $i \neq j$, but the diagonal element
$A_{ii}$ is equal to twice the number of self-edges from
i to itself (and hence is always an even number).

We let the number of edges between each pair of vertices
(or between a vertex and itself in the case of self-edges)
be independently Poisson distributed and define
$\omega_{rs}$ to be the expected value of the adjacency matrix element
$A_{ij}$ for vertices i and j lying in groups r and s
respectively. Note that this implies that the expected
number of self-edges at a vertex in group r is $\frac{1}{2}\omega_{rr}$ because
of the factor of two in the definition of the diagonal
elements of the adjacency matrix.

Now we can write the probability $P(G|\omega, g)$ of graph G
given the parameters and group assignments as

$$P(G|\omega, g) = \prod_{i<j} \frac{(\omega_{g_i g_j})^{A_{ij}}}{A_{ij}!} \exp(-\omega_{g_i g_j}) \times \prod_i \frac{(\tfrac{1}{2}\omega_{g_i g_i})^{A_{ii}/2}}{(A_{ii}/2)!} \exp\left(-\tfrac{1}{2}\omega_{g_i g_i}\right). \qquad (1)$$

Given that $A_{ij} = A_{ji}$ and $\omega_{rs} = \omega_{sr}$, Eq. (1) can after
a small amount of manipulation be rewritten in the more
convenient form

$$P(G|\omega, g) = \frac{1}{\prod_{i<j} A_{ij}! \, \prod_i 2^{A_{ii}/2} (A_{ii}/2)!} \prod_{rs} \omega_{rs}^{m_{rs}/2} \exp\left(-\tfrac{1}{2} n_r n_s \omega_{rs}\right), \qquad (2)$$

where $n_r$ is the number of vertices in group r and

$$m_{rs} = \sum_{ij} A_{ij} \, \delta_{g_i, r} \, \delta_{g_j, s}, \qquad (3)$$

which is the total number of edges between groups r and
group s, or twice that number if r = s.
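
The counts $n_r$ and $m_{rs}$ are simple to tabulate from an adjacency matrix. The sketch below assumes a dense list-of-lists matrix that follows the multigraph convention described above (diagonal entries equal to twice the number of self-edges); the function name `block_counts` and the toy graph are hypothetical.

```python
def block_counts(A, g, K):
    """Return (n_r, m) where n_r[r] is the number of vertices in group r
    and m[r][s] is the sum of A[i][j] over i in group r and j in group s."""
    n_r = [0] * K
    m = [[0] * K for _ in range(K)]
    for i, gi in enumerate(g):
        n_r[gi] += 1
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
    return n_r, m

# Path 0-1-2 plus one self-edge at vertex 2, so A[2][2] = 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 2]]
g = [0, 0, 1]
n_r, m = block_counts(A, g, K=2)
```

In this example the diagonal entries of `m` are twice the number of edges inside each group, and the entries of `m` sum to twice the total number of edges (three edges, counting the self-edge).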

Our goal is to maximize this probability with respect to
the unknown model parameters $\omega_{rs}$ and the group assignments
of the vertices. In most cases, it will in fact be simpler
to maximize the logarithm of the probability (whose
maximum is in the same place). Neglecting constants
and terms independent of the parameters and group assignments
(i.e., independent of $\omega_{rs}$, $n_r$, and $m_{rs}$), the
logarithm is given by

$$\log P(G|\omega, g) = \sum_{rs} \left( m_{rs} \log \omega_{rs} - n_r n_s \omega_{rs} \right). \qquad (4)$$



We will maximize this expression in two stages, first
with respect to the model parameters $\omega_{rs}$, then with
respect to the group assignments $g_i$. The maximum-likelihood
values $\hat{\omega}_{rs}$ of the model parameters (where hatted
variables indicate maximum-likelihood estimates) are
found by simple differentiation to be

$$\hat{\omega}_{rs} = \frac{m_{rs}}{n_r n_s}, \qquad (5)$$

and the value of Eq. (4) at this maximum is
$\log P(G|\hat{\omega}, g) = \sum_{rs} m_{rs} \log(m_{rs}/n_r n_s) - 2m$, where
$m = \frac{1}{2} \sum_{rs} m_{rs}$ is the total number of edges in the network.
Dropping the final constant, we define the unnormalized
log-likelihood for the group assignment g:

$$\mathcal{L}(G|g) = \sum_{rs} m_{rs} \log \frac{m_{rs}}{n_r n_s}. \qquad (6)$$

The maximum of this quantity with respect to the group 
assignments now tells us the most likely set of assign- 
ments [Ulllij]. In effect, Eq. ((6]) gives us an objective or 
quality function which is large for "good" group assign- 
ments and small for "poor" ones. Many such objective 
functions have been defined elsewhere in the literature on 
community detection and graph partitioning, but Eq. (6)
differs from most other choices in being derived from first 
principles, rather than heuristically motivated or simply 
proposed ad hoc. 
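
To make the objective function concrete, the following sketch evaluates the unnormalized log-likelihood of Eq. (6) for two candidate divisions of a toy network. The function name, the dense-matrix representation, and the example graph (two triangles joined by one edge) are assumptions made for illustration only.

```python
import math

def uncorrected_log_likelihood(A, g, K):
    """Eq. (6): sum over r, s of m_rs * log(m_rs / (n_r * n_s))."""
    n_r = [0] * K
    m = [[0] * K for _ in range(K)]
    for i, gi in enumerate(g):
        n_r[gi] += 1
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
    total = 0.0
    for r in range(K):
        for s in range(K):
            if m[r][s] > 0:  # terms with m_rs = 0 contribute nothing
                total += m[r][s] * math.log(m[r][s] / (n_r[r] * n_r[s]))
    return total

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

planted = uncorrected_log_likelihood(A, [0, 0, 0, 1, 1, 1], K=2)
mixed = uncorrected_log_likelihood(A, [0, 1, 0, 1, 0, 1], K=2)
```

On this example the division into the two triangles scores higher than the alternating division, as one would hope for a "good" versus a "poor" assignment.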

Equation (6) has an interesting information-theoretic
interpretation. By adding and dividing by constant factors
of the total number of vertices and edges the equation
can be written in the alternative form

$$\mathcal{L}(G|g) = \sum_{rs} \frac{m_{rs}}{2m} \log \frac{m_{rs}/2m}{n_r n_s/n^2}, \qquad (7)$$

where again we have neglected irrelevant constants. Now
imagine, for a given set of group assignments, that we
choose an edge uniformly at random from our network,
and let X be the group assignment at one (randomly selected)
end of the edge and Y be the group assignment at
the other end of the edge. The probability distribution
of the variables X and Y is then $p_K(X = r, Y = s) =
p_K(r, s) = m_{rs}/2m$, which appears twice in the above expression.
The remaining terms in the denominator of the
logarithm in (7) are equal to the expected value of the
same probability in a network with the same group assignments
but different edges, the edges now being placed
completely at random without regard for the groups. Call
this second distribution $p_1(r, s)$. Equation (7) can then
be written

$$\mathcal{L}(G|g) = \sum_{rs} p_K(r, s) \log \frac{p_K(r, s)}{p_1(r, s)}, \qquad (8)$$

which is the well-known Kullback-Leibler divergence between
the probability distributions $p_K$ and $p_1$ [23].

The Kullback-Leibler divergence is not precisely a distance
measure since it is not symmetric in $p_K$ and $p_1$.
However, if the logarithms are taken base 2 then it measures
the expected number of extra bits required to encode
X and Y if $p_1$ is mistakenly used as the distribution
for X and Y instead of the assumed true distribution $p_K$.
So intuitively it can be considered as measuring how far
$p_K$ is from $p_1$. The most likely group assignments under
the ordinary stochastic blockmodel are then those assignments
that require the most information to describe
starting from a model that does not have group structure.
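
The relation between the two forms can be checked numerically. The sketch below computes the Kullback-Leibler divergence of Eq. (8) and compares it with the log-likelihood of Eq. (6) divided by 2m and shifted by the constant $\log(n^2/2m)$, which is the constant that is "neglected" in passing between the two forms. The block counts are hypothetical values for two triangles joined by one edge.

```python
import math

def kl_uniform_null(m, n_r):
    """KL divergence between p_K(r,s) = m_rs / 2m and
    p_1(r,s) = n_r * n_s / n^2 (natural logarithms)."""
    two_m = sum(sum(row) for row in m)
    n = sum(n_r)
    kl = 0.0
    for r, row in enumerate(m):
        for s, m_rs in enumerate(row):
            if m_rs > 0:
                p_k = m_rs / two_m
                p_1 = n_r[r] * n_r[s] / n ** 2
                kl += p_k * math.log(p_k / p_1)
    return kl

# Assumed block counts: two groups of three vertices.
m = [[6, 1], [1, 6]]
n_r = [3, 3]
two_m, n = 14, 6

kl = kl_uniform_null(m, n_r)
log_lik = sum(x * math.log(x / (n_r[r] * n_r[s]))
              for r, row in enumerate(m)
              for s, x in enumerate(row) if x > 0)
shifted = log_lik / two_m + math.log(n ** 2 / two_m)
```

Because the two quantities differ only by a positive scale factor and an additive constant, they are maximized by the same group assignment.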

This type of approach, in which one constructs an ob- 
jective function that measures the difference between an 
observed quantity and the expected value of the same 
quantity under an appropriate null model, is common in 
work on community detection in networks. One widely 
used objective function is the so-called modularity:

$$Q(g) = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(g_i, g_j), \qquad (9)$$

where $A_{ij}$ is an element of the adjacency matrix and $P_{ij}$
is the expected value of the same element under some
null model. The null model assumed in our blockmodel
calculation is one in which $P_{ij}$ is constant. Making the
same choice for the modularity would lead to

$$Q(g) = \sum_{r=1}^{K} \left[ p_K(r, r) - p_1(r, r) \right]. \qquad (10)$$

The modularity, however, is not normally used this way,
and for good reason. This null model, corresponding to
a multigraph version of the Erdős-Rényi random graph,
produces highly unrealistic networks, even for networks
with no community structure. Specifically, it produces
networks with Poisson degree distributions, in stark contrast
to most real networks, which tend to have broad
distributions of vertex degree. To avoid this problem,
modularity is usually defined using a different null model
that fixes the expected degree sequence to be the same
as that of the observed network. Within this model
$P_{ij} = k_i k_j/2m$, where $k_i$ is the degree of vertex i. Then
the probability distribution over the group assignments
at the end of a randomly chosen edge becomes

$$p_{\mathrm{degree}}(X = r, Y = s) = p_{\mathrm{degree}}(r, s) = \frac{\kappa_r}{2m} \frac{\kappa_s}{2m}, \qquad (11)$$

where

$$\kappa_r = \sum_s m_{rs} = \sum_i k_i \, \delta_{g_i, r} \qquad (12)$$

is the total number of ends of edges, commonly called
stubs, that emerge from vertices in group r, or equivalently
the sum of the degrees of the vertices in group r.
(Note that Eq. (12) correctly counts two stubs for edges
that both start and end in group r.) Then the desired
group assignments are given by the maximum of

$$Q(g) = \sum_{r=1}^{K} \left[ p_K(r, r) - p_{\mathrm{degree}}(r, r) \right]. \qquad (13)$$

This choice of null model is found to give significantly 
better results than the original uniform model because 
it allows for the fact that vertices with high degree are,
all other things being equal, more likely to be connected 
than those with low degree, simply because they have 
more edges. From an information-theoretic viewpoint, an 
edge between two high-degree vertices is less surprising 
than an edge between two low-degree vertices and we get 
better results if we incorporate this observation in our 
model. 
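
As a small illustration, the modularity with the degree-preserving null model can be computed directly from the adjacency matrix as $\sum_r [m_{rr}/2m - (\kappa_r/2m)^2]$. The function name and the toy graph below are assumptions for the purpose of the example.

```python
def modularity_degree_null(A, g, K):
    """Modularity with the null model P_ij = k_i * k_j / 2m:
    sum over groups r of m_rr / 2m - (kappa_r / 2m)^2."""
    two_m = sum(sum(row) for row in A)
    m_rr = [0] * K      # within-group entries of the adjacency matrix
    kappa = [0] * K     # total degree (number of stubs) in each group
    for i, gi in enumerate(g):
        kappa[gi] += sum(A[i])
        for j, gj in enumerate(g):
            if gi == gj:
                m_rr[gi] += A[i][j]
    return sum(m_rr[r] / two_m - (kappa[r] / two_m) ** 2 for r in range(K))

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

q_planted = modularity_degree_null(A, [0, 0, 0, 1, 1, 1], K=2)
q_single = modularity_degree_null(A, [0, 0, 0, 0, 0, 0], K=1)
```

For this graph the division into the two triangles gives Q = 5/14, while placing every vertex in a single group gives Q = 0, since then the observed and expected within-group fractions are both one.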

Returning to the stochastic blockmodel, using $p_1$ instead
of $p_{\mathrm{degree}}$ in the objective function causes problems
similar to those that affect the modularity. Fits to the
model may incorrectly suggest that structure in the network
due merely to the degree sequence is a result instead
of group memberships. We will shortly see explicit
real-world cases in which such incorrect conclusions arise.
The solution to this problem, as with the modularity, is
to define a stochastic blockmodel that directly incorporates
arbitrary heterogeneous degree distributions.



III. DEGREE-CORRECTED STOCHASTIC
BLOCKMODEL

In the degree-corrected blockmodel, the probability
distribution over undirected multigraphs with self-edges
(again denoted G) depends not only on the parameters
introduced previously but also on a new set of parameters
$\theta_i$ controlling the expected degrees of vertices i.

As before, we assume there are K groups, $\omega_{rs}$ is a
K × K symmetric matrix of parameters controlling edges
between groups r and s, and $g_i$ is the group assignment
of vertex i. As in the uncorrected blockmodel, let the
numbers of edges each be drawn from a Poisson distribution,
but now, following [20] and [21], let the expected
value of the adjacency matrix element $A_{ij}$ be $\theta_i \theta_j \omega_{g_i g_j}$.
Then graph G has probability

$$P(G|\theta, \omega, g) = \prod_{i<j} \frac{(\theta_i \theta_j \omega_{g_i g_j})^{A_{ij}}}{A_{ij}!} \exp(-\theta_i \theta_j \omega_{g_i g_j}) \times \prod_i \frac{(\tfrac{1}{2} \theta_i^2 \omega_{g_i g_i})^{A_{ii}/2}}{(A_{ii}/2)!} \exp\left(-\tfrac{1}{2} \theta_i^2 \omega_{g_i g_i}\right). \qquad (14)$$

The $\theta$ parameters are arbitrary to within a multiplicative
constant which is absorbed into the $\omega$ parameters. Their
normalization can be fixed by imposing the constraint

$$\sum_i \theta_i \, \delta_{g_i, r} = 1 \qquad (15)$$

for all groups r, which makes $\theta_i$ equal to the probability
that an edge connected to the community to which i belongs
lands on i itself. With this constraint, the probability
$P(G|\theta, \omega, g)$ can be simplified to the more convenient
form

$$P(G|\theta, \omega, g) = \frac{1}{\prod_{i<j} A_{ij}! \, \prod_i 2^{A_{ii}/2} (A_{ii}/2)!} \prod_i \theta_i^{k_i} \prod_{rs} \omega_{rs}^{m_{rs}/2} \exp\left(-\tfrac{1}{2} \omega_{rs}\right), \qquad (16)$$

with $k_i$ being the degree of vertex i as previously and $m_{rs}$
defined as in Eq. (3). As before, rather than maximizing
this probability, it is more convenient to maximize its
logarithm, which, ignoring constants, is

$$\log P(G|\theta, \omega, g) = 2 \sum_i k_i \log \theta_i + \sum_{rs} \left( m_{rs} \log \omega_{rs} - \omega_{rs} \right). \qquad (17)$$

Allowing for the constraint (15), the maximum-likelihood
values of the parameters $\theta_i$ and $\omega_{rs}$ are then given by

$$\hat{\theta}_i = \frac{k_i}{\kappa_{g_i}}, \qquad \hat{\omega}_{rs} = m_{rs}, \qquad (18)$$

where $\kappa_r$ is the sum of the degrees in group r as before
(see Eq. (12)). This maximum-likelihood parameter
estimate has the appealing property of preserving the
expected numbers of edges between groups and the expected
degree sequence of the network. To see this, let
$\langle x \rangle$ be the average of x in the ensemble of graphs with
parameters (18). Then the expected number of edges
between groups r and s is

$$\sum_{ij} \langle A_{ij} \rangle \, \delta_{g_i, r} \delta_{g_j, s} = \sum_{ij} \frac{k_i k_j}{\kappa_r \kappa_s} m_{rs} \, \delta_{g_i, r} \delta_{g_j, s} = m_{rs}, \qquad (19)$$

where we have made use of Eq. (12). Similarly, the average
degree of vertex i in the ensemble is

$$\sum_j \langle A_{ij} \rangle = \sum_j \frac{k_i k_j}{\kappa_{g_i} \kappa_{g_j}} m_{g_i g_j} = \sum_r \sum_j \frac{k_i}{\kappa_{g_i}} \frac{k_j}{\kappa_r} m_{g_i r} \, \delta_{g_j, r} = k_i. \qquad (20)$$

Traditional blockmodels, by contrast, preserve only the
expected value of the matrix $m_{rs}$ and not the expected
degree: every vertex in group r in the traditional blockmodel
has the same expected degree $\sum_j m_{r g_j}/(n_r n_{g_j}) = \kappa_r/n_r$.

Substituting Eq. (18) into Eq. (17), the maximum of
$\log P(G|\theta, \omega, g)$ for the degree-corrected blockmodel is

$$\log P(G|\hat{\theta}, \hat{\omega}, g) = 2 \sum_i k_i \log \frac{k_i}{\kappa_{g_i}} + \sum_{rs} m_{rs} \log m_{rs} - 2m, \qquad (21)$$

where as before m is the total number of edges in the network.
The first term in this expression can be rewritten
as

$$\begin{aligned}
2 \sum_i k_i \log \frac{k_i}{\kappa_{g_i}} &= 2 \sum_i k_i \log k_i - 2 \sum_i \sum_r k_i \, \delta_{g_i, r} \log \kappa_r \\
&= 2 \sum_i k_i \log k_i - \sum_r \kappa_r \log \kappa_r - \sum_s \kappa_s \log \kappa_s \\
&= 2 \sum_i k_i \log k_i - \sum_{rs} m_{rs} \log \kappa_r \kappa_s, \qquad (22)
\end{aligned}$$

where we have again made use of Eq. (12). Substituting
back into Eq. (21) and dropping overall constants then
gives us an unnormalized log-likelihood function of

$$\mathcal{L}(G|g) = \sum_{rs} m_{rs} \log \frac{m_{rs}}{\kappa_r \kappa_s}. \qquad (23)$$

Notice that the only difference between this degree-corrected
log-likelihood and the uncorrected log-likelihood
of Eq. (6) is the replacement of the number $n_r$
of vertices in each group by the number $\kappa_r$ of stubs. Minor
though this replacement may seem, it has
a big effect, as we will shortly see.
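
A minimal sketch of the degree-corrected objective follows. It mirrors the uncorrected calculation but divides by the stub counts $\kappa_r \kappa_s$ rather than by the vertex counts. The function name, the dense adjacency representation, and the toy graph are illustrative assumptions.

```python
import math

def corrected_log_likelihood(A, g, K):
    """Eq. (23): sum over r, s of m_rs * log(m_rs / (kappa_r * kappa_s))."""
    kappa = [0] * K
    m = [[0] * K for _ in range(K)]
    for i, gi in enumerate(g):
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
            kappa[gi] += A[i][j]   # kappa_r is the row sum of m
    total = 0.0
    for r in range(K):
        for s in range(K):
            if m[r][s] > 0:  # terms with m_rs = 0 contribute nothing
                total += m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
    return total

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

planted = corrected_log_likelihood(A, [0, 0, 0, 1, 1, 1], K=2)
mixed = corrected_log_likelihood(A, [0, 1, 0, 1, 0, 1], K=2)
```

As with the uncorrected objective, the division into the two triangles receives the higher score on this small example.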

As before, we can interpret the optimization of the objective
function (23) through the lens of information theory.
Adding and multiplying by constant factors allows
us to write the log-likelihood in the form

$$\mathcal{L}(G|g) = \sum_{rs} \frac{m_{rs}}{2m} \log \frac{m_{rs}/2m}{(\kappa_r/2m)(\kappa_s/2m)}, \qquad (24)$$

which is the Kullback-Leibler divergence between $p_K$
and $p_{\mathrm{degree}}$. Alternatively, noting that $p_{\mathrm{degree}}$ is the
product of the marginal distributions $\sum_r p_K(r, s)$ and
$\sum_s p_K(r, s)$, this particular form of divergence can also
be thought of as the mutual information of the random
variables representing the group labels at either end of
a randomly chosen edge. Loosely speaking, the best fit
to the degree-corrected stochastic blockmodel gives the
group assignment that is most surprising compared to the
null model with given expected degree sequence, whereas
the ordinary stochastic blockmodel gives the group assignment
that is most surprising compared to the Erdős-Rényi
random graph.



Information-theoretic quantities have been proposed 
previously as possible objective functions for community 
detection or clustering. Dhillon et al. [24], for instance,
used mutual information as an objective function for clustering
bipartite graphs, as part of an approach they call
"information-theoretic co-clustering." Equation (23) is
also somewhat reminiscent of an objective function of
Reichardt et al. [18] which, if translated into our terminology
and adapted to undirected networks, is equivalent
to the total variation distance between $p_K$ and $p_{\mathrm{degree}}$,
variation distance being an alternative measure of the
distance between two probability distributions. While
the variation distance and the Kullback-Leibler divergence
are related, both falling in the class of so-called
f-divergences, the optimization of variation distance does
not, to our knowledge, correspond to maximizing the like- 
lihood of any generative model, and there are significant 
benefits to the connection with generative models. In 
particular, one can easily create networks from the en- 
semble of our model and in addition the connection to 
generative processes means that a posteriori blockmodel- 
ing fits into standard frameworks for statistical inference, 
which are well studied and understood in other contexts. 

Equation (23) could also be used as a measure of assortative
mixing among discrete vertex characteristics in
networks [25, 26]. In a network such as a social network,
where connections between individuals can depend
on characteristics such as nationality, race, or gender, our 
objective function could be used, for instance, to quantify 
which of several such characteristics is more predictive of 
network structure. 

A useful property of the objective function in Eq. (23)
when used for a posteriori blockmodeling is that it is possible
to quickly compute the change in the log-likelihood
when a single vertex switches groups. When a vertex
changes groups from r to s, only $\kappa_r$, $\kappa_s$, $m_{rt}$, and $m_{st}$
(for any t) can change (with $m_{rs}$ symmetric). This
means that many terms cancel out of the difference of
log-likelihoods and can be ignored in the computations.

Consider moving vertex i from community r to community s. Let $k_{it}$ be the number of edges from vertex i to
vertices in group t excluding self-edges, and let $u_i$ be the number of self-edges for vertex i. These quantities are the
same for all possible moves of vertex i. Define $a(x) = 2x \log x$ and $b(x) = x \log x$, where $a(0) = 0$ and $b(0) = 0$. Then
the change in the log-likelihood can be written:

$$\begin{aligned}
\Delta\mathcal{L} = {} & \sum_{t \neq r, s} \left[ a(m_{rt} - k_{it}) - a(m_{rt}) + a(m_{st} + k_{it}) - a(m_{st}) \right] + a(m_{rs} + k_{ir} - k_{is}) - a(m_{rs}) \\
& + b(m_{rr} - 2(k_{ir} + u_i)) - b(m_{rr}) + b(m_{ss} + 2(k_{is} + u_i)) - b(m_{ss}) - a(\kappa_r - k_i) + a(\kappa_r) - a(\kappa_s + k_i) + a(\kappa_s). \qquad (25)
\end{aligned}$$



This quantity can be evaluated in time $O(K + \langle k \rangle)$ on average, and finding the s that gives the maximum $\Delta\mathcal{L}$ for
given i and r can thus be done in time $O(K(K + \langle k \rangle))$.
Because these computations can be done quickly for a
reasonable number of communities, local vertex switching
algorithms, such as single-vertex Monte Carlo, can be
implemented easily. Monte Carlo, however, is slow, and
we have found competitive results using a local heuristic
algorithm similar in spirit to the Kernighan-Lin algorithm
used in minimum-cut graph partitioning [27].
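
The change in log-likelihood for a single-vertex move can be checked against a direct recomputation. The sketch below is an illustration under stated assumptions, not an optimized implementation: for brevity it recomputes the block counts on every call, whereas an efficient version would keep $m_{rs}$ and $\kappa_r$ updated incrementally. The function names and the small multigraph (with one double edge and one self-edge) are hypothetical, and the target group s is assumed to differ from the current group r.

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def a(x):
    return 2.0 * xlogx(x)

def b(x):
    return xlogx(x)

def block_counts(A, g, K):
    m = [[0] * K for _ in range(K)]
    kappa = [0] * K
    for i, gi in enumerate(g):
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
            kappa[gi] += A[i][j]
    return m, kappa

def log_likelihood(A, g, K):
    """Degree-corrected objective written as
    sum_rs m_rs log m_rs - 2 sum_r kappa_r log kappa_r."""
    m, kappa = block_counts(A, g, K)
    return (sum(xlogx(m[r][s]) for r in range(K) for s in range(K))
            - 2.0 * sum(xlogx(kappa[r]) for r in range(K)))

def delta_move(A, g, K, i, s):
    """Change in the objective when vertex i moves from g[i] to s != g[i]."""
    r = g[i]
    m, kappa = block_counts(A, g, K)   # recomputed here only for brevity
    u = A[i][i] // 2                   # number of self-edges at i
    k_to = [0] * K                     # edges from i to each group, no self-edges
    for j, gj in enumerate(g):
        if j != i:
            k_to[gj] += A[i][j]
    k_i = sum(A[i])                    # degree of i (self-edges count twice)
    d = 0.0
    for t in range(K):
        if t != r and t != s:
            d += a(m[r][t] - k_to[t]) - a(m[r][t])
            d += a(m[s][t] + k_to[t]) - a(m[s][t])
    d += a(m[r][s] + k_to[r] - k_to[s]) - a(m[r][s])
    d += b(m[r][r] - 2 * (k_to[r] + u)) - b(m[r][r])
    d += b(m[s][s] + 2 * (k_to[s] + u)) - b(m[s][s])
    d += -a(kappa[r] - k_i) + a(kappa[r]) - a(kappa[s] + k_i) + a(kappa[s])
    return d

# Hypothetical multigraph: a double edge 2-3 and a self-edge at vertex 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 3),
         (3, 4), (4, 5), (0, 5), (2, 4), (2, 2)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1   # a self-edge therefore adds 2 to the diagonal

g = [0, 0, 1, 1, 2, 2]
predicted = delta_move(A, g, 3, i=2, s=0)
g_after = list(g)
g_after[2] = 0
direct = log_likelihood(A, g_after, 3) - log_likelihood(A, g, 3)
```

The incremental value `predicted` and the brute-force difference `direct` agree to within floating-point error.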

Briefly, in this algorithm we divide the network into 
some initial set of K communities at random. Then we 
repeatedly move a vertex from one group to another, se- 
lecting at each step the move that will most increase the 
objective function — or least decrease it if no increase is 
possible — subject to the restriction that each vertex may 
be moved only once. When all vertices have been moved, 
we inspect the states through which the system passed 
from start to end of the procedure, select the one with the 
highest objective score, and use this state as the starting 
point for a new iteration of the same procedure. When 
a complete such iteration passes without any increase in 
the objective function, the algorithm ends. As with many 
deterministic algorithms, we have found it helpful to run 
the calculation with several different random initial con- 
ditions and take the best result over all runs. 
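
The procedure just described can be sketched as follows. This is a deliberately simple illustration rather than an efficient implementation: it rescores every candidate move from scratch using the degree-corrected objective, so it is only suitable for very small networks. All function names, the tolerance, the number of restarts, and the toy graph are assumptions.

```python
import math
import random

def score(A, g, K):
    """Degree-corrected objective: sum_rs m_rs log(m_rs / (kappa_r kappa_s))."""
    m = [[0] * K for _ in range(K)]
    kappa = [0] * K
    for i, gi in enumerate(g):
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
            kappa[gi] += A[i][j]
    return sum(m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
               for r in range(K) for s in range(K) if m[r][s] > 0)

def sweep(A, g, K):
    """Move every vertex exactly once, always taking the best available
    move, and return the best state passed through (K >= 2 assumed)."""
    g = list(g)
    best_g, best_val = list(g), score(A, g, K)
    unmoved = set(range(len(g)))
    while unmoved:
        choice = None
        for i in sorted(unmoved):
            old = g[i]
            for s in range(K):
                if s != old:
                    g[i] = s
                    val = score(A, g, K)
                    if choice is None or val > choice[0]:
                        choice = (val, i, s)
            g[i] = old
        val, i, s = choice
        g[i] = s                  # accept the move even if the score drops
        unmoved.remove(i)
        if val > best_val:
            best_g, best_val = list(g), val
    return best_g, best_val

def fit(A, K, restarts=5, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        g = [rng.randrange(K) for _ in range(len(A))]
        current = score(A, g, K)
        while True:
            g_new, new = sweep(A, g, K)
            if new <= current + 1e-12:   # no improvement: stop this run
                break
            g, current = g_new, new
        if best is None or current > best[1]:
            best = (g, current)
    return best

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

labels, best_score = fit(A, K=2)
```

Swapping `score` for the uncorrected objective gives the non-degree-corrected variant of the same heuristic; a practical version would replace the full rescoring with an incremental update of the objective.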







IV. RESULTS 

We have tested the performance of the degree- 
corrected and uncorrected blockmodels in applications 
both to real-world networks with known community as- 
signments and to a range of synthetic (i.e., computer- 
generated) networks. We evaluate performance by quan- 
titative comparison of the community assignments found 
by the algorithms and the known assignments. As a metric
for comparison we use the normalized mutual information,
which is defined as follows [7]. Let $n_{rs}$ be the
number of vertices in community r in the inferred group
assignment and in community s in the true assignment.
Then define $p(X = r, Y = s) = n_{rs}/n$ to be the joint
probability that a randomly selected vertex is in r in the
inferred assignment and s in the true assignment. Using
this joint probability over the random variables X and
Y, the normalized mutual information is



$$\mathrm{NMI}(X, Y) = \frac{2 \, \mathrm{MI}(X, Y)}{H(X) + H(Y)}, \qquad (26)$$



where MI(X, Y) is the mutual information and H(Z) is 
the entropy of random variable Z. The normalized mu- 
tual information measures the similarity of the two com- 
munity assignments and takes a value of one if the as- 
signments are identical and zero if they are uncorrelated. 
A discussion of this and other measures can be found in 
Ref. [H. 
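
A minimal sketch of this measure is given below. The function name `nmi` is hypothetical, and the value returned when both assignments place all vertices in a single group (where the denominator vanishes) is an arbitrary convention adopted here, not something specified above.

```python
import math
from collections import Counter

def nmi(x, y):
    """Normalized mutual information 2 MI(X,Y) / (H(X) + H(Y)) between two
    group assignments given as equal-length sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mi = sum((c / n) * math.log((c / n) / ((px[r] / n) * (py[s] / n)))
             for (r, s), c in joint.items())
    denom = entropy(px) + entropy(py)
    if denom == 0:
        return 1.0   # assumed convention: both assignments are a single group
    return 2.0 * mi / denom

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])       # identical up to relabeling
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

The first comparison gives a value of one, since the measure is insensitive to how the groups are labeled, and the second gives zero.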



FIG. 1: Divisions of the karate club network found using the 
(a) uncorrected and (b) corrected blockmodels. The size of a 
vertex is proportional to its degree and vertex color reflects 
inferred group membership. The dashed line indicates the 
split observed in real life. 



A. Empirical networks 

We have tested our algorithms on real-world networks
ranging in size from tens to tens of thousands of ver- 
tices. In networks with highly homogeneous degree distri- 
butions we find little difference in performance between 
the degree-corrected and uncorrected blockmodels, which 
is expected since for networks with uniform degrees the 
two models have the same likelihood up to an additive 
constant. Our primary concern, therefore, is with net- 
works that have heterogeneous degree distributions, and 
we here give two examples that show the effects of het- 
erogeneity clearly. 
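
The statement about uniform degrees can be verified directly. If every vertex has the same degree k, then the number of stubs in group r is k times the number of vertices in it, so the degree-corrected objective equals the uncorrected one minus the constant $4m \log k$, whatever the group assignment. The sketch below checks this on a ring of six vertices (k = 2, m = 6); the function name and the example are assumptions.

```python
import math

def both_log_likelihoods(A, g, K):
    """Return (uncorrected, corrected) unnormalized log-likelihoods."""
    n_r = [0] * K
    kappa = [0] * K
    m = [[0] * K for _ in range(K)]
    for i, gi in enumerate(g):
        n_r[gi] += 1
        for j, gj in enumerate(g):
            m[gi][gj] += A[i][j]
            kappa[gi] += A[i][j]
    unc = cor = 0.0
    for r in range(K):
        for s in range(K):
            if m[r][s] > 0:
                unc += m[r][s] * math.log(m[r][s] / (n_r[r] * n_r[s]))
                cor += m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
    return unc, cor

# Ring of six vertices: every vertex has degree 2 and there are 6 edges.
A = [[0] * 6 for _ in range(6)]
for i in range(6):
    j = (i + 1) % 6
    A[i][j] += 1
    A[j][i] += 1

constant = -4 * 6 * math.log(2)   # -4 m log k
diffs = []
for g in ([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]):
    unc, cor = both_log_likelihoods(A, g, K=2)
    diffs.append(cor - unc)
```

Both assignments give the same difference, so the two objectives rank all assignments identically on a regular graph.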

The first example, widely studied in the field, is the 
"karate club" network of Zachary [2{|. This is a social 
network representing friendship patterns between the 34 
members of a karate club at a US university. The club 
in question is known to have split into two different fac- 
tions as a result of an internal dispute, and the members 
of each faction are known. It has been demonstrated 
that the factions can be extracted from a knowledge 
of the complete network by many community detection 
methods. 

Applying our inference algorithms to this network, using
corrected and uncorrected blockmodels with K = 2,
we find the results shown in Fig. 1. As pointed out also
by other authors [H|,[3(|, the non-degree-corrected block- 
model fails to split the network into the known factions 
(indicated by the dashed line in the figure), instead splitting
it into a group composed of high-degree vertices and
another of low. The degree-corrected model, on the other 
hand, splits the vertices according to the known commu- 
nities, except for the misidentification of one vertex on 
the boundary of the two groups. (The same vertex is also 
misplaced by a number of other commonly used commu- 
nity detection algorithms.) 

The failure of the uncorrected model in this context 
is precisely because it does not take the degree sequence 
into account. The a priori probability of an edge be- 
tween two vertices varies as the product of their degrees, 
a variation that can be fit by the uncorrected blockmodel 
if we divide the network into high- and low-degree groups. 
Given that we have only one set of groups to assign, how- 
ever, we are obliged to choose between this fit and the 
true community structure. In the present case it turns 
out that the division into high and low degrees gives the 
higher likelihood and so it is this division that the algo- 
rithm returns. In the degree-corrected blockmodel, by 
contrast, the variation of edge probability with degree is 
already included in the functional form of the likelihood, 
which frees up the block structure for fitting to the true 
communities. 

Moreover it is apparent that this behavior is not lim- 
ited to the case K = 2. For K = 3, the ordinary 
stochastic blockmodel will, for sufficiently heterogeneous 
degrees, be biased towards splitting into three groups by 
degree — high, medium, and low — and similarly for higher 
values of K. It is of course possible that the true com- 
munity structure itself corresponds entirely or mainly to 
groups of high and low degree, but we only want our 
model to find this structure if it is still statistically sur- 
prising once we know about the degree sequence, and this 
is precisely what the corrected model does. 

As a second real-world example we show in Fig. 2 an
application to a network of political blogs assembled by
Adamic and Glance [31]. This network is composed of
blogs (i.e., personal or group web diaries) about US pol- 
itics and the web links between them, as captured on 
a single day in 2005. The blogs have known political 
leanings and were labeled by Adamic and Glance as ei- 
ther liberal or conservative in the data set. We consider 
the network in undirected form and examine only the 
largest connected component, which has 1222 vertices. 
Figure 2 shows that, as with the karate club, the uncorrected
stochastic blockmodel splits the vertices into high-
and low-degree groups, while the degree-corrected model 
finds a split more aligned with the political division of 
the network. While not matching the known labeling ex- 
actly, the split generated by the degree-corrected model 
has a normalized mutual information of 0.72 with the la- 
beling of Adamic and Glance, compared with 0.0001 for 
the uncorrected model. 
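
As an illustration of the comparison measure quoted above, the following is a minimal sketch of a normalized mutual information between two labelings. It assumes the normalization 2I(A;B)/(H(A)+H(B)), which may differ from the exact definition used for the reported values; the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """Return 2*I(A;B) / (H(A) + H(B)) for two labelings of the same vertices.

    The choice of normalization is an assumption of this sketch.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        # Shannon entropy of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a = entropy(count_a)
    h_b = entropy(count_b)

    # Mutual information from the joint label counts.
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

    if h_a + h_b == 0.0:
        # Both labelings put every vertex in a single group.
        return 1.0
    return 2.0 * mutual / (h_a + h_b)
```

The measure is invariant under relabeling of the groups, so a split that matches the reference labeling up to a swap of group names still scores 1, while a split that is statistically independent of the reference scores 0.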




(a) Without degree-correction    (b) With degree-correction

FIG. 2: Divisions of the political blog network found using the
(a) uncorrected and (b) corrected blockmodels. The size of a
vertex is proportional to its degree and vertex color reflects
inferred group membership. The division in (b) corresponds
roughly to the division between liberal and conservative blogs
given in [31].

(To make sure that these results were not due to a fail- 
ure of the heuristic optimization scheme, we also checked 
that the group assignments found by the heuristic have a 
higher objective score than the known group assignments, 
and that using the known assignments as the initial con- 
dition for the optimization recovers the same group as- 
signments as found with random initial conditions.) 



B. Generation of synthetic networks 

We turn now to synthetic networks. The networks we 
use are themselves generated from the degree-corrected 
stochastic blockmodel, which is ideally designed to play
exactly this role. (Indeed, though it is not the primary 
focus of this article, we believe that the blockmodel may 
in general be of use as a source of flexible and challeng- 
ing benchmark networks for testing the performance of 
community detection strategies.) 

In order to generate networks we must first choose the
values of $g$, $\omega$, and $\theta$. The group assignments $g$ can be
chosen in any way we please, and we can also choose
freely the values for the expected degrees of all vertices,
which fixes the $\theta$ variables according to Eq. (18). Choosing
the values of $\omega_{rs}$ involves a little more work. In principle,
any set of nonnegative values is acceptable provided
it is symmetric in $r$ and $s$ and satisfies $\sum_s \omega_{rs} = \kappa_r$, with
$\kappa_r$ as in Eq. (12). However, because we wish to be able
to vary the level of community structure in our networks
we choose $\omega_{rs}$ in the present case to have the particular
form



$$\omega_{rs} = \lambda\,\omega_{rs}^{\text{planted}} + (1-\lambda)\,\omega_{rs}^{\text{random}}. \tag{27}$$



This form allows us to interpolate linearly between the
values $\omega_{rs}^{\text{planted}}$ and $\omega_{rs}^{\text{random}}$ using the parameter $\lambda$. The
matrix $\omega^{\text{random}}$ represents a fully random network with no group
structure; it is defined to be the expected value of $\omega_{rs}$ in
a random graph with fixed expected degrees [32], which
is simply $\omega_{rs}^{\text{random}} = \kappa_r \kappa_s / 2m$.
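
A brief check, under the assumption (not restated in this section) that the group totals satisfy $\sum_s \kappa_s = 2m$, suggests that the random term meets the row-sum constraint by itself, so that the mixture of Eq. (27) meets it for any value of the mixing parameter whenever the planted term does:

```latex
\sum_s \omega_{rs}^{\text{random}}
  = \frac{\kappa_r}{2m} \sum_s \kappa_s
  = \kappa_r ,
\qquad
\sum_s \left[ \lambda\,\omega_{rs}^{\text{planted}}
  + (1-\lambda)\,\omega_{rs}^{\text{random}} \right]
  = \lambda \kappa_r + (1-\lambda) \kappa_r
  = \kappa_r .
```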

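The value of $\omega^{\text{planted}}$ by contrast is chosen to create
group structure. A simple example with four groups is:

$$\omega^{\text{planted}} = \begin{pmatrix} \kappa_1 & 0 & 0 & 0 \\ 0 & \kappa_2 & 0 & 0 \\ 0 & 0 & \kappa_3 & 0 \\ 0 & 0 & 0 & \kappa_4 \end{pmatrix}. \tag{28}$$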



With this choice, all edges will be placed within communities
when $\lambda = 1$ and none between communities.
When $\lambda = 0$, on the other hand, all edges will be placed
randomly, conditioned on the degree sequence, and for
intermediate values of $\lambda$ we interpolate between these two
extremes in a controlled fashion. (This model is similar to
the benchmark network ensemble previously proposed by
Lancichinetti et al. [33]; roughly speaking it is the "canonical
ensemble" version of the "microcanonical" model in [33].)

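More complicated choices of $\omega^{\text{planted}}$ are also possible.
Examples include the core-periphery structure

$$\omega^{\text{planted}} = \begin{pmatrix} \kappa_1 - \kappa_2 & \kappa_2 \\ \kappa_2 & 0 \end{pmatrix}, \tag{29}$$

where $\kappa_1 > \kappa_2$. In the case where $\kappa_1 \approx \kappa_2$ this choice
also generates approximately bipartite networks, where
most edges run between the two groups and few lie inside.
Another possibility is a hierarchical structure of the form

$$\omega^{\text{planted}} = \begin{pmatrix} \kappa_1 - A & A & 0 \\ A & \kappa_2 - A & 0 \\ 0 & 0 & \kappa_3 \end{pmatrix}, \tag{30}$$

where $A < \min(\kappa_1, \kappa_2)$.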
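
The planted structures above and their mixture with the random term can be sketched in a few lines. This is an illustrative sketch rather than the code used for the reported experiments: the function names are hypothetical, NumPy is assumed to be available, and the total of the group degree sums is taken to equal 2m.

```python
import numpy as np


def omega_random(kappa):
    """Random term kappa_r * kappa_s / 2m, assuming 2m = sum of kappa."""
    kappa = np.asarray(kappa, dtype=float)
    return np.outer(kappa, kappa) / kappa.sum()


def planted_diagonal(kappa):
    """Diagonal planted structure: every edge lies within a group."""
    return np.diag(np.asarray(kappa, dtype=float))


def planted_core_periphery(kappa):
    """Two-group core-periphery structure; needs kappa_1 >= kappa_2."""
    k1, k2 = (float(x) for x in kappa)
    if k1 < k2:
        raise ValueError("requires kappa_1 >= kappa_2 for nonnegative entries")
    return np.array([[k1 - k2, k2], [k2, 0.0]])


def planted_hierarchical(kappa, a):
    """Three-group hierarchical structure with coupling a between groups 1 and 2."""
    k1, k2, k3 = (float(x) for x in kappa)
    if not 0.0 <= a <= min(k1, k2):
        raise ValueError("requires 0 <= a <= min(kappa_1, kappa_2)")
    return np.array([[k1 - a, a, 0.0], [a, k2 - a, 0.0], [0.0, 0.0, k3]])


def mix_omega(planted, kappa, lam):
    """Linear interpolation between the planted and random terms."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * np.asarray(planted, dtype=float) + (1.0 - lam) * omega_random(kappa)
```

Each planted matrix has row sums equal to the group degree sums, so the mixed matrix keeps the expected degrees fixed while the mixing parameter varies.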



In mixed models such as these, each edge in effect has a
probability $\lambda$ of being chosen from the planted structure
and $1 - \lambda$ of being chosen from the null model. Among the
edges attached to a given vertex, the expected fraction
drawn from the planted structure is $\lambda$ and the remainder
are drawn randomly.

Once we have chosen our values for $g$, $\theta$, and $\omega$,
the network generation itself is a straightforward
implementation of the blockmodel: we first draw a
Poisson-distributed number of edges for each pair of groups $r$, $s$
with mean $\omega_{rs}$ (or $\tfrac{1}{2}\omega_{rs}$ when $r = s$), then we assign each
end of an edge to a vertex in the appropriate group with
probability $\theta_i$.
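
The generation procedure just described might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, NumPy is assumed, groups are labeled 0 to K-1, and the endpoint probabilities are taken to be each vertex's expected degree divided by the total expected degree of its group.

```python
import numpy as np


def generate_dcsbm(groups, expected_degree, omega, seed=None):
    """Draw an edge list from a degree-corrected blockmodel (illustrative sketch).

    groups          : group label (0..K-1) of each vertex
    expected_degree : expected degree of each vertex
    omega           : symmetric K x K matrix of expected edge counts
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    expected_degree = np.asarray(expected_degree, dtype=float)
    omega = np.asarray(omega, dtype=float)
    k = omega.shape[0]

    # Vertices in each group and their within-group endpoint probabilities.
    members = [np.flatnonzero(groups == r) for r in range(k)]
    theta = [expected_degree[idx] / expected_degree[idx].sum() for idx in members]

    edges = []
    for r in range(k):
        for s in range(r, k):
            # Mean edge count is omega_rs, halved on the diagonal.
            mean = omega[r, s] / 2.0 if r == s else omega[r, s]
            count = rng.poisson(mean)
            if count == 0:
                continue
            # Choose each end independently, weighted by theta.
            left = rng.choice(members[r], size=count, p=theta[r])
            right = rng.choice(members[s], size=count, p=theta[s])
            edges.extend(zip(left.tolist(), right.tolist()))
    return edges
```

As written, the sketch can return repeated edges and self-edges, since each end of every edge is drawn independently.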



C. Performance on synthetic networks 

There are two primary considerations in comparing 
the degree-corrected and uncorrected blockmodels on our 
synthetic benchmark networks. The first is how close 
the group assignments found in our calculations are to 
the planted group assignments. The second is the per- 
formance of the heuristic optimization algorithm. It is 
possible that the maximum-likelihood group assignment 
may be close to the true group assignment but that our 
heuristic is unable to find it. And if the heuristic per- 
forms better in general for either the corrected or un- 
corrected blockmodel it may make comparisons between 
the models unreliable: we want to claim that the degree- 
corrected model gives better results than the uncorrected 
version because it has a better objective function for het- 
erogeneous networks and not because we used a biased 
optimization algorithm. 

To shed light on these questions we take the following 
approach. For both the degree-corrected model and the 
uncorrected model we perform tests with random initial 
conditions and with initial conditions equal to the known 
planted group structure. The latter (planted) initializa- 
tions tell us whether the planted group assignment, or 
something close to it, is a local optimum of the respec- 
tive objective function — if it is, our heuristic should find 
that optimum most of the time and return a final assign- 
ment similar to the planted one. This should be true for 
essentially any reasonable heuristic, even a biased one, 
since the heuristic will be making only minimal changes 
to the group assignments (or none at all). 

For small values of $\lambda$ we expect that the planted assignment
is not near a local maximum, but for large $\lambda$ we
would hope that it is. Thus, if we discover in the process 
of running our heuristic that it is not, it strongly sug- 
gests we have made a poor choice of objective function 
(and this conclusion should hold even if the heuristic is 
biased). 

The results of such tests on our synthetic networks are
shown in Fig. 3. We plot the normalized mutual information
as a function of $\lambda$ for various choices of planted
structure. Each data point represents an average over 30
networks of size $n = 1000$ for both the degree-corrected
and uncorrected objective functions. In the case of random
initializations, ten initializations were performed for
each network and we take the best result among the ten.

FIG. 3: The average normalized mutual information as a function of $\lambda$ for the three synthetic tests described in the text.
Filled squares and transparent circles indicate tests initialized with planted and random assignments respectively. Green points
denote results for the degree-corrected blockmodel and black for the ordinary uncorrected model. The left, middle, and right
panels show respectively the results for the two-group two-degree networks, core-periphery networks, and hierarchical networks.
The error bars indicate the standard error on the mean computed from 30 networks per data point.

The left panel in the figure shows results for networks 
with two communities and just two possible values of the 
expected degree, 10 and 30. Each of the 1000 vertices 
was assigned to one of the four possible combinations of 
degree and community with equal probability, and the 
planted structure was chosen diagonal, as in Eq. (28).

The green points in the figure indicate the performance
of the degree-corrected blockmodel, while the
black points are for the uncorrected model. Solid squares
and open circles show performance starting from the
planted community structure and random assignments
respectively. Bearing in mind that $\lambda = 0$ corresponds to
zero planted structure (in which case neither algorithm
should find any significant result) and that a normalized
mutual information approaching 1 indicates successful
detection of the planted structure, we can see from
the figure that the degree-corrected blockmodel significantly
outperforms the uncorrected one in this simple
test. As $\lambda$ increases from zero, the mutual information
for all algorithms rises, but the corrected model starts to
detect some signatures of the planted structure almost
immediately and for $\lambda = \tfrac{1}{2}$ returns a normalized mutual
information above 0.7 for both initial conditions. The
uncorrected model, by contrast, finds no planted structure
at $\lambda = \tfrac{1}{2}$ for either initialization, including when the
algorithm is initialized to the known correct answer. The
reason for this poor performance is precisely the
variation in degrees: for values of $\lambda$ up to around 0.6
the uncorrected model fits these networks better if vertices
are assigned to groups according to their degree than
if they are assigned according to the planted structure,
and hence the best-fit group structure has no correlation
with the planted structure.
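
To make the comparison of fits concrete, the sketch below scores a given group assignment under both objectives. It assumes the objectives take the forms sum over r, s of m_rs log(m_rs / (n_r n_s)) for the uncorrected model and m_rs log(m_rs / (kappa_r kappa_s)) for the degree-corrected model, where m_rs counts edges between groups r and s (within-group edges counted twice), n_r is the number of vertices in group r, and kappa_r is the sum of degrees in group r. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def block_counts(edges, groups):
    """Return m[(r, s)] and kappa[r] for an undirected edge list."""
    m = defaultdict(float)
    kappa = defaultdict(float)
    for u, v in edges:
        r, s = groups[u], groups[v]
        # Adding both orientations counts within-group edges twice.
        m[(r, s)] += 1.0
        m[(s, r)] += 1.0
        kappa[r] += 1.0
        kappa[s] += 1.0
    return m, kappa


def log_likelihood(edges, groups, degree_corrected):
    """Score a group assignment (higher is better) under the assumed objectives."""
    m, kappa = block_counts(edges, groups)
    size = Counter(groups)  # n_r: number of vertices in each group
    total = 0.0
    for (r, s), m_rs in m.items():
        if degree_corrected:
            denom = kappa[r] * kappa[s]
        else:
            denom = size[r] * size[s]
        # Pairs of groups with no edges contribute nothing to the sum.
        total += m_rs * math.log(m_rs / denom)
    return total
```

Evaluating both scores for a planted assignment and for a split by degree on the same network gives a direct way to see which division each objective prefers.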



We have also tested our blockmodels against synthetic
networks with two other types of structure, one
the core-periphery or approximately bipartite structure
of Eq. (29) and the other the hierarchical structure of
Eq. (30). In these examples we use a more realistic degree
distribution that approximately follows a power law
with a minimum expected degree of 10 and an exponent
of $-2.5$. For the core-periphery networks we randomly
assign vertices to one of the two groups, while for the
hierarchical networks we fix 500 vertices to be in the
first of the three groups, put the rest randomly in the
other two, and set $A = \tfrac{1}{2}\min(\kappa_1, \kappa_2)$. (It has been suggested
that choosing non-equal sizes for groups in this
way presents a harder challenge for structure detection
algorithms [33, 34].)
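
One way to set up the hierarchical test described above is sketched here. The sampling scheme is an assumption: expected degrees are drawn from a continuous power law by inverse-transform sampling with no upper cutoff, which only approximates whatever procedure was actually used. The function names are hypothetical and NumPy is assumed.

```python
import numpy as np


def power_law_expected_degrees(n, k_min=10.0, exponent=2.5, seed=None):
    """Sample n expected degrees with density proportional to k**(-exponent), k >= k_min."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return k_min * (1.0 - u) ** (-1.0 / (exponent - 1.0))


def hierarchical_groups(n, n_first=500, seed=None):
    """Put the first n_first vertices in group 0 and the rest randomly in groups 1 and 2."""
    rng = np.random.default_rng(seed)
    groups = np.zeros(n, dtype=int)
    groups[n_first:] = rng.integers(1, 3, size=n - n_first)
    return groups


def hierarchical_coupling(groups, degrees):
    """Return A = min(kappa_1, kappa_2) / 2 from the group degree sums."""
    kappa = np.bincount(groups, weights=degrees, minlength=3)
    return 0.5 * min(kappa[0], kappa[1])
```

Because the tail of this distribution is heavy, individual draws can be very large; a practical implementation might impose a maximum expected degree, which is not done here.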

The performance of our blockmodels on these two 
classes of networks is shown in the middle (core-periphery)
and right (hierarchical) panels of Fig. 3.
Again we see that the normalized mutual information in- 
creases with increasing A for all algorithms but that the 
degree-corrected blockmodel performs significantly bet- 
ter than the uncorrected model. The degree-corrected 
model with planted assignments consistently does the 
best among the four options as we would expect, and 
the degree-corrected model with random initializations 
performs respectably in all cases, although it's entirely 
possible that better performance could be obtained with 
a better optimization strategy. The performance of 
the uncorrected model with random initializations, on 
the other hand, is quite poor [35]. But perhaps the 
most telling comparison is the one between the degree- 
corrected model with random initial assignments and the 
uncorrected model with the planted assignment. This 
comparison tilts the playing field heavily in favor of the 
uncorrected model and yet, as Fig. 3 shows, the
degree-corrected model still performs about as well as, and in
some cases better than, the uncorrected model.

V. CONCLUSIONS 

In this paper, we have studied how one can incorpo- 
rate heterogeneous vertex degrees into stochastic block- 
models in a simple way, improving the performance of the 
models for statistical inference of group structure. The 
resulting degree-corrected blockmodels can also be used 
as generative models for creating benchmark networks, 
retaining the generality and tractability of other block- 
models while producing degree sequences closer to those 
of real networks. 

We have found the performance of the degree-corrected 
model for inference of group structure to be quantita- 
tively better on both synthetic and real-world test net- 
works than the uncorrected model. In networks with 
substantial degree heterogeneity, the uncorrected model 
prefers to split networks into groups of high and low de- 
gree, and this preference can prevent it from finding the 
true group memberships. The degree-corrected model 
correctly ignores divisions based solely on degree and 
hence is more sensitive to underlying structure. 

It seems likely that other more sophisticated block- 
models, such as the recently proposed overlapping and 
mixed membership models, would benefit from incor- 
porating degree sequences also. In applications to on- 
line social network data, for example, where overlapping 
groups are common, there is frequently substantial de- 
gree heterogeneity and hence potentially significant ben- 
efits to using a degree-corrected model. 

The degree-corrected blockmodel is not without its 
faults. For instance, the model as described can produce 
an unrealistic number of zero-degree vertices, and is also 
unable to model some degree sequences, such as those in 
which certain values of the degree are entirely forbidden. 
As a model of real-world networks, it may also fail to 
accurately represent higher-order network structure such 
as overrepresented network motifs or degree correlations. 
From a statistical point of view, it is also somewhat
unsatisfactory that the number of parameters in the model
scales with the size of the network, which for example
prevents fits to a network of one size being used to gen- 
erate synthetic networks of another size. 

But perhaps the chief current drawback of the model 
is that the number K of blocks or groups in the network 
is assumed given. In most structure detection problems 
the number of groups is not known and a complete calcu- 
lation will therefore require not only the algorithms de- 
scribed in this paper but also a method for estimating K. 
Some previously suggested approaches to this problem
include cross-validation, minimum description length
methods using two-part or universal codes [30],
maximization of a marginal likelihood [10], and nonparametric
Bayesian methods. The marginal likelihood for
our degree-corrected blockmodel can be computed explicitly
if one assumes conjugate priors on the parameters
(Dirichlet for $\theta$ and gamma for $\omega$), but then one must
also choose the parameters of those priors, called
hyperparameters in the statistical literature. In principle one
wants to choose values of the hyperparameters that pro- 
vide asymptotic consistency — the blockmodel should re- 
turn the correct number of groups when applied to a 
network generated from the same blockmodel, at least 
in certain limits. At present, however, it is not known 
how to make this choice. An alternative possibility is 
to note that the blockmodel used here is equivalent to a 
model that generates an ensemble of matrices with inte- 
ger entries, implying potential connections to the large 
statistical literature on contingency table analysis that 
could be helpful in determining the number of groups in 
a principled fashion. We leave these questions for future 
work. 